Trigger out of band VM state update via libvirt event when VM stops #7963
Conversation
Codecov Report
@@ Coverage Diff @@
## main #7963 +/- ##
============================================
- Coverage 29.16% 27.48% -1.69%
+ Complexity 30377 28203 -2174
============================================
Files 5100 5100
Lines 358273 358325 +52
Branches 52304 52308 +4
============================================
- Hits 104496 98481 -6015
- Misses 239406 246236 +6830
+ Partials 14371 13608 -763
... and 398 files with indirect coverage changes
DaanHoogland left a comment
code seems to do what you describe @mlsorensen. To be doubly clear: this marks VMs as Stopped when they are either stopped from within the client OS or from the host CLI, leaving other functionality as is.
This is in addition to the migration events you recently submitted, isn't it?
...ypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/LibvirtComputingResource.java
It will notify the Management Server immediately if a VM is stopped within the guest OS, or in the event of a crash. As for the admin CLI on the host, Libvirt events don't seem to be able to make a distinction between CloudStack calling it to destroy a VM vs the admin CLI doing something like a `virsh destroy`.
@blueorangutan package
@rohityadavcloud a [SF] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7104
DaanHoogland left a comment
code looks good, will do a monkey test
Packaging result [LL]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 6185
@blueorangutan test
@rohityadavcloud a [SF] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests
[LL] Trillian test result (tid-6747)
[SF] Trillian test result (tid-7725)
…pache#7963)
* Trigger out of band VM state update via libvirt event when VM stops
* Add License headers, refactor nested try
---------
Co-authored-by: Marcus Sorensen <mls@apple.com>
(cherry picked from commit 3694667)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Description
Pushing this to get some feedback. The aim is to provide this feature by reusing existing functionality, like the VM power state reporting, rather than inventing whole new messaging.
This PR allows KVM to detect guests that stop or crash, and immediately trigger a power state report to update the VM state in CloudStack.
In the current design, the Agent is responsible for sending pings on an interval. These pings contain a state report for each VM. If something changes in between these pings, it can potentially take a long time to discover the VM state change.
When the Agent first starts, it loads the `ServerResource` (the implementation is `LibvirtComputingResource` in the shipping agent) and initializes it. Later it calls the `ServerResource`'s `getCurrentStatus()` to collect the host and VM status to send in `PingCommand` on intervals.

This change adds two interfaces: if the `ServerResource` implements `ResourceStatusUpdater`, then the agent registers itself as an `AgentStatusUpdater`, which gives the `ServerResource` a way to trigger an update by calling `AgentStatusUpdater.triggerUpdate()`. This keeps the implementation of monitoring and collecting the VM status in the `ServerResource`, while allowing the `Agent` to still handle sending the update, and not requiring all existing implementations of `ServerResource` and `IAgentControl` to implement these by changing the existing interfaces. There may be a cleaner way to do this.
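For illustration, here is a minimal sketch of the two interfaces described above. The interface names and `triggerUpdate()` come from this description; the `registerStatusUpdater()` method name and the wiring shown in the comments are assumptions, not necessarily the exact code in this PR.

```java
// Sketch only: names beyond AgentStatusUpdater.triggerUpdate() are assumptions.

/** Implemented by the Agent; lets a ServerResource request an immediate status report. */
public interface AgentStatusUpdater {
    void triggerUpdate();
}

/** Implemented by a ServerResource that can push out-of-band status updates. */
public interface ResourceStatusUpdater {
    void registerStatusUpdater(AgentStatusUpdater updater);
}

// During agent startup (roughly): if the loaded ServerResource is a
// ResourceStatusUpdater, the Agent (acting as the AgentStatusUpdater) passes
// itself in, so the resource can later call triggerUpdate() without any change
// to the existing ServerResource or IAgentControl interfaces:
//
//   if (serverResource instanceof ResourceStatusUpdater) {
//       ((ResourceStatusUpdater) serverResource).registerStatusUpdater(agent);
//   }
```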
In `LibvirtComputingResource`, we register an event listener and process domain lifecycle events, looking only for `STOPPED` events that are due to a crash or a shutdown (see the sketch below). A domain stop due to things like `virsh destroy` or CloudStack issuing a stop will have a detail of `DESTROYED`, or `MIGRATED` in the case of migration, rather than `CRASHED` or `SHUTDOWN`. I briefly considered adding some code to track whether we were in the middle of a `StopCommand` or similar to filter out superfluous events, but this seems simpler, at the expense of not being able to update on an admin `virsh destroy`.

The `PingCommand` has been given a boolean to indicate whether the ping is out of band. This is important because the code that processes pings on the management server will ignore pings that come more often than the expected interval (presumably to avoid state thrashing?). This boolean gives us the ability to force processing of pings when they are out of band. It is therefore also important that we only trigger these on valid state change events, and do not issue superfluous updates any time a VM stops due to CloudStack issuing a stop, etc.
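As an illustration of the filtering described above, here is a minimal sketch under assumptions: it deliberately avoids asserting exact libvirt-java event API signatures, and instead assumes the lifecycle callback registered in `LibvirtComputingResource` hands the event type and detail to a hypothetical helper. The class and method names below are illustrative, not the PR's actual code.

```java
// Hypothetical helper invoked from the libvirt lifecycle event callback registered in
// LibvirtComputingResource. The event type and detail are passed as plain strings here
// to avoid assuming exact libvirt-java event classes; in the real binding they are enums.
public final class DomainStopEventFilter {

    private final AgentStatusUpdater updater; // registered by the Agent (see sketch above)

    public DomainStopEventFilter(AgentStatusUpdater updater) {
        this.updater = updater;
    }

    /**
     * Only STOPPED events whose detail is SHUTDOWN or CRASHED force an update.
     * DESTROYED (virsh destroy, CloudStack-issued stop) and MIGRATED result from
     * orchestrated operations and are reported through the normal command flow instead.
     */
    public void onLifecycleEvent(String vmName, String eventType, String detail) {
        if (!"STOPPED".equals(eventType)) {
            return;
        }
        if ("SHUTDOWN".equals(detail) || "CRASHED".equals(detail)) {
            // The Agent then sends a PingCommand flagged as out of band, so the
            // management server processes it even though it arrives off-interval.
            updater.triggerUpdate();
        }
    }
}
```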
How Has This Been Tested?
Tested locally by shutting down the VM within the guest, vs `virsh destroy` or stopping via the CloudStack API. Tested to ensure the listener still works after a libvirt restart.

There are no existing tests for `VirtualMachinePowerStateSyncImpl`, and the change here is very minor (reacting to the boolean).
It seems tricky to build a unit test for the Libvirt event listener.
We could possibly write a smoke test to SSH into a VM, shut it down, and check the state of the VM via the API?